diff --git a/docs/modules/agent/architecture.md b/docs/modules/agent/architecture.md index 75e58b7b8f8..b36942560bc 100644 --- a/docs/modules/agent/architecture.md +++ b/docs/modules/agent/architecture.md @@ -2,19 +2,19 @@ title: Architecture --- -## 1. Overview of InLong-Agent +## 1 Overview of InLong-Agent InLong-Agent is a collection tool that supports multiple types of data sources, and is committed to achieving stable and efficient data collection functions between multiple heterogeneous data sources including file, sql, Binlog, metrics, etc. -### The brief architecture diagram is as follows: +### 1.1 The brief architecture diagram is as follows: ![](img/architecture.png) -### design concept +### 1.2 design concept In order to solve the problem of data source diversity, InLong-agent abstracts multiple data sources into a unified source concept, and abstracts sinks to write data. When you need to access a new data source, you only need to configure the format and reading parameters of the data source to achieve efficient reading. -### Current status of use +### 1.3 Current status of use InLong-Agent is widely used within the Tencent Group, undertaking most of the data collection business, and the amount of online data reaches tens of billions. -## 2. InLong-Agent architecture +## 2 InLong-Agent architecture The InLong Agent task is used as a data acquisition framework, constructed with a channel + plug-in architecture. Read and write the data source into a reader/writer plug-in, and then into the entire framework. + Reader: Reader is the data collection module, responsible for collecting data from the data source and sending the data to the channel. @@ -22,7 +22,7 @@ The InLong Agent task is used as a data acquisition framework, constructed with + Channel: The channel used to connect the reader and writer, and as the data transmission channel of the connection, which realizes the function of data reading and monitoring -## 3. 
Different kinds of agent +## 3 Different kinds of agent ### 3.1 file agent File collection includes the following functions: diff --git a/docs/modules/agent/quick_start.md b/docs/modules/agent/quick_start.md index 7e567c27f58..04ef7c2f208 100644 --- a/docs/modules/agent/quick_start.md +++ b/docs/modules/agent/quick_start.md @@ -2,7 +2,7 @@ title: Build && Deployment --- -## 1、Configuration +## 1 Configuration ``` cd inlong-agent ``` @@ -10,7 +10,7 @@ cd inlong-agent The agent supports two modes of operation: local operation and online operation -### Agent configuration +### 1.1 Agent configuration Online operation needs to pull the configuration from inlong-manager, the configuration conf/agent.properties is as follows: ```ini @@ -20,7 +20,7 @@ agent.manager.vip.http.host=manager web host agent.manager.vip.http.port=manager web port ``` -## 2、run +## 2 run After decompression, run the following command ```bash @@ -28,9 +28,9 @@ sh agent.sh start ``` -## 3、Add job configuration in real time +## 3 Add job configuration in real time -#### 3.1 agent.properties Modify the following two places +### 3.1 agent.properties Modify the following two places ```ini # whether enable http service agent.http.enable=true @@ -38,7 +38,7 @@ agent.http.enable=true agent.http.port=Available ports ``` -#### 3.2 Execute the following command +### 3.2 Execute the following command ```bash curl --location --request POST 'http://localhost:8008/config/job' \ --header 'Content-Type: application/json' \ @@ -78,7 +78,7 @@ agent.http.port=Available ports - proxy.streamId: The streamId type used when writing proxy, streamId is the data flow id showed on data flow window in inlong-manager -## 4、eg for directory config +## 4 eg for directory config E.g: /data/inlong-agent/test.log //Represents reading the new file test.log in the inlong-agent folder @@ -87,7 +87,7 @@ agent.http.port=Available ports /data/inlong-agent/^\\d+(\\.\\d+)? // Start with one or more digits, followed by. or end with one. or more digits (? stands for optional, can match Examples: "5", "1.5" and "2.21" -## 5. Support to get data time from file name +## 5 Support to get data time from file name Agent supports obtaining the time from the file name as the production time of the data. The configuration instructions are as follows: /data/inlong-agent/***YYYYMMDDHH*** @@ -143,7 +143,7 @@ curl --location --request POST'http://localhost:8008/config/job' \ }' ``` -## 6. Support time offset reading +## 6 Support time offset reading After the configuration is read by time, if you want to read data at other times than the current time, you can configure the time offset to complete Configure the job attribute name as job.timeOffset, the value is number + time dimension, time dimension includes day and hour diff --git a/docs/modules/dataproxy-sdk/architecture.md b/docs/modules/dataproxy-sdk/architecture.md index 591163cd05d..42bc4bb181e 100644 --- a/docs/modules/dataproxy-sdk/architecture.md +++ b/docs/modules/dataproxy-sdk/architecture.md @@ -1,16 +1,16 @@ --- title: Architecture --- -# 1、intro +## 1 intro When the business uses the message access method, the business generally only needs to format the data in a proxy-recognizable format (such as six-segment protocol, digital protocol, etc.) After group packet transmission, data can be connected to inlong. 
However, to ensure data reliability, load balancing, dynamic updates of the proxy list, and other such safeguards, the user program would need to take on much more logic, which ultimately makes it cumbersome and bloated.

The original intention of the API design is to simplify user access and take over some of this reliability-related logic. After integrating the API into the delivery program, users can send data to the proxy without worrying about the packaging format, load balancing, or other such logic.

-# 2、functions
+## 2 functions

-## 2.1 overall functions
+### 2.1 overall functions

| function | description |
| ---- | ---- |
@@ -22,9 +22,9 @@ The original intention of API design is to simplify user access and assume some
| proxy list persistence (new)| Persist the proxy list by business group id, so that data can still be sent if the configuration center is unavailable when the program starts

-## 2.2 Data transmission function description
+### 2.2 Data transmission function description

-### Synchronous batch function
+#### Synchronous batch function

    public SendResult sendMessage(List bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

@@ -32,7 +32,7 @@ The original intention of API design is to simplify user access and assume some

    bodyList is the collection of data items the user needs to send; its total length is recommended to be less than 512 KB. groupId represents the business (group) id, and streamId represents the interface (stream) id. dt is the timestamp of the data, accurate to the millisecond; it can also be set to 0, in which case the API uses the current time as the timestamp. timeout and timeUnit set the send timeout; 20s is generally recommended.

-### Synchronize a single function
+#### Synchronous single function

    public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

@@ -41,7 +41,7 @@ The original intention of API design is to simplify user access and assume some

    body is the content of the single data item the user wants to send; the remaining parameters have the same meaning as in the batch sending interface.

-### Asynchronous batch function
+#### Asynchronous batch function

    public void asyncSendMessage(SendMessageCallback callback, List bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)

@@ -50,7 +50,7 @@ The original intention of API design is to simplify user access and assume some

    SendMessageCallback is the callback used to process the send result. bodyList is the collection of data items the user needs to send; its total length is recommended to be less than 512 KB. groupId is the business (group) id, and streamId is the interface (stream) id. dt is the timestamp of the data, accurate to the millisecond; it can also be set to 0, in which case the API uses the current time as the timestamp. timeout and timeUnit set the send timeout; 20s is generally recommended.
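To make the calling pattern concrete, below is a minimal Java sketch of how the interfaces above might be used. The `Sender`, `SendResult`, and `SendMessageCallback` definitions are simplified placeholders rather than the real SDK types (whose names, construction, and configuration are not covered in this section), and the `byte[]` element type assumed for `bodyList` is likewise an assumption; only the method shapes and parameter meanings follow the descriptions above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ProxySendSketch {

    // Placeholder for the SDK result type (values are assumptions, for illustration only).
    enum SendResult { OK, TIMEOUT, UNKNOWN_ERROR }

    // Placeholder for the SDK callback type described above (method names are assumptions).
    interface SendMessageCallback {
        void onMessageAck(SendResult result);
        void onException(Throwable ex);
    }

    // Placeholder for a sender object exposing the send interfaces quoted above.
    interface Sender {
        SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId,
                               long dt, long timeout, TimeUnit timeUnit);

        void asyncSendMessage(SendMessageCallback callback, List<byte[]> bodyList, String groupId,
                              String streamId, long dt, long timeout, TimeUnit timeUnit);
    }

    static void sendExamples(Sender sender) {
        String groupId = "my_group";    // business group id
        String streamId = "my_stream";  // interface (stream) id
        List<byte[]> bodyList = Arrays.asList(
                "record-1".getBytes(StandardCharsets.UTF_8),
                "record-2".getBytes(StandardCharsets.UTF_8));

        // Synchronous batch send: dt = 0 lets the API fill in the current time,
        // and the 20s timeout follows the recommendation above.
        SendResult result = sender.sendMessage(bodyList, groupId, streamId, 0, 20, TimeUnit.SECONDS);
        System.out.println("sync batch result: " + result);

        // Asynchronous batch send: the callback is invoked once the proxy acks or an error occurs.
        sender.asyncSendMessage(new SendMessageCallback() {
            @Override public void onMessageAck(SendResult r) { System.out.println("async ack: " + r); }
            @Override public void onException(Throwable ex) { ex.printStackTrace(); }
        }, bodyList, groupId, streamId, 0, 20, TimeUnit.SECONDS);
    }
}
```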
-### Asynchronous single function +#### Asynchronous single function public void asyncSendMessage(SendMessageCallback callback, byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit) diff --git a/docs/modules/dataproxy/architecture.md b/docs/modules/dataproxy/architecture.md index de2a89a13ac..a897a7a8c51 100644 --- a/docs/modules/dataproxy/architecture.md +++ b/docs/modules/dataproxy/architecture.md @@ -1,14 +1,14 @@ --- title: Architecture --- -# 1、intro +## 1 intro Inlong-dataProxy belongs to the inlong proxy layer and is used for data collection, reception and forwarding. Through format conversion, the data is converted into TDMsg1 format that can be cached and processed by the cache layer InLong-dataProxy acts as a bridge from the InLong collection end to the InLong buffer end. Dataproxy pulls the relationship between the business group id and the corresponding topic name from the manager module, and internally manages the producers of multiple topics The overall architecture of inlong-dataproxy is based on Apache Flume. On the basis of this project, inlong-bus expands the source layer and sink layer, and optimizes disaster tolerance forwarding, which improves the stability of the system. -# 2、architecture +## 2 architecture ![](img/architecture.png) @@ -16,7 +16,7 @@ title: Architecture 2. The channel layer has a selector, which is used to choose which type of channel to go. If the memory is eventually full, the data will be processed. 3. The data of the channel layer will be forwarded through the sink layer. The main purpose here is to convert the data to the TDMsg1 format and push it to the cache layer (tube is more commonly used here) -# 3、DataProxy support configuration instructions +## 3 DataProxy support configuration instructions DataProxy supports configurable source-channel-sink, and the configuration method is the same as the configuration file structure of flume: @@ -158,7 +158,7 @@ agent1.sinks.meta-sink-more1.max-survived-size = 3000000 Maximum number of caches ``` -# 4、Monitor metrics configuration instructions +## 4 Monitor metrics configuration instructions DataProxy provide monitor indicator based on JMX, user can implement the code that read the metrics and report to user-defined monitor system. Source-module and Sink-module can add monitor metric class that is the subclass of org.apache.inlong.commons.config.metrics.MetricItemSet, and register it to MBeanServer. User-defined plugin can get module metric with JMX, and report metric data to different monitor system. diff --git a/docs/modules/dataproxy/quick_start.md b/docs/modules/dataproxy/quick_start.md index e8bcc697ae7..55b93981dd1 100644 --- a/docs/modules/dataproxy/quick_start.md +++ b/docs/modules/dataproxy/quick_start.md @@ -1,11 +1,11 @@ --- title: Build && Deployment --- -## Deploy DataProxy +## 1 Deploy DataProxy All deploying files at `inlong-dataproxy` directory. -### config TubeMQ master +### 1.1 config TubeMQ master `tubemq_master_list` is the rpc address of TubeMQ Master. 
``` @@ -14,13 +14,13 @@ $ sed -i 's/TUBE_LIST/tubemq_master_list/g' conf/flume.conf notice that conf/flume.conf FLUME_HOME is proxy the directory for proxy inner data -### Environmental preparation +### 1.2 Environmental preparation ``` sh prepare_env.sh ``` -### config manager web url +### 1.3 config manager web url configuration file: `conf/common.properties`: ``` @@ -28,19 +28,19 @@ configuration file: `conf/common.properties`: manager_hosts=ip:port ``` -## run +## 2 run ``` sh bin/start.sh ``` -## check +## 3 check ``` telnet 127.0.0.1 46801 ``` -## Add DataProxy configuration to InLong-Manager +## 4 Add DataProxy configuration to InLong-Manager After installing the DataProxy, you need to insert the IP and port of the DataProxy service is located into the backend database of InLong-Manager. diff --git a/docs/modules/manager/architecture.md b/docs/modules/manager/architecture.md index 84256a96c35..d82ff16257b 100644 --- a/docs/modules/manager/architecture.md +++ b/docs/modules/manager/architecture.md @@ -2,19 +2,19 @@ title: Architecture --- -## Introduction to Apache InLong Manager +## 1 Introduction to Apache InLong Manager + Target positioning: Apache inlong is positioned as a one-stop data access solution, providing complete coverage of big data access scenarios from data collection, transmission, sorting, and technical capabilities. + Platform value: Users can complete task configuration, management, and indicator monitoring through the platform's built-in management and configuration platform. At the same time, the platform provides SPI extension points in the main links of the process to implement custom logic as needed. Ensure stable and efficient functions while lowering the threshold for platform use. + Apache InLong Manager is the user-oriented unified UI of the entire data access platform. After the user logs in, it will provide different function permissions and data permissions according to the corresponding role. The page provides maintenance portals for the platform's basic clusters (such as mq, sorting), and you can view basic maintenance information and capacity planning adjustments at any time. At the same time, business users can complete the creation, modification and maintenance of data access tasks, and index viewing and reconciliation functions. The corresponding background service will interact with the underlying modules when users create and start tasks, and deliver the tasks that each module needs to perform in a reasonable way. Play the role of coordinating the execution process of the serial back-end business. -## Architecture +## 2 Architecture ![](img/inlong-manager.png) -##Module division of labor +## 3 Module division of labor | Module | Responsibilities | | :----| :---- | @@ -24,9 +24,9 @@ title: Architecture | manager-web | Front-end interactive response interface | | manager-workflow-engine | Workflow Engine | -## use process +## 4 use process ![](img/interactive.jpg) -## data model +## 5 data model ![](img/datamodel.jpg) \ No newline at end of file diff --git a/docs/modules/manager/quick_start.md b/docs/modules/manager/quick_start.md index 6f617549bfc..3d32971d2cb 100644 --- a/docs/modules/manager/quick_start.md +++ b/docs/modules/manager/quick_start.md @@ -2,7 +2,7 @@ title: Build && Deployment --- -# 1. 
Environmental preparation
+## 1 Environmental preparation

- Install and start MySQL 5.7+, copy the `doc/sql/apache_inlong_manager.sql` file in the inlong-manager module to the server where the MySQL database is located (for example, to the `/data/` directory), then load this file with the following command to initialize the table structure and basic data:
@@ -25,15 +25,15 @@ title: Build && Deployment
  to [Compile and deploy TubeMQ Manager](https://inlong.apache.org/zh-cn/docs/modules/tubemq/tubemq-manager/quick_start.html), install and start TubeManager.

-# 2. Deploy and start manager-web
+## 2 Deploy and start manager-web

**manager-web is a background service that interacts with the front-end page.**

-## 2.1 Prepare installation files
+### 2.1 Prepare installation files

All installation files are in the `inlong-manager-web` directory.

-## 2.2 Modify configuration
+### 2.2 Modify configuration

Go to the decompressed `inlong-manager-web` directory and modify the `conf/application.properties` file:

@@ -74,7 +74,7 @@ The dev configuration is specified above, then modify the `conf/application-dev.
 sort.appName=inlong_app
 ```

-## 2.3 Start the service
+### 2.3 Start the service

Enter the decompressed directory and execute `sh bin/startup.sh` to start the service, then check the log with `tailf log/manager-web.log`. If a log similar to the following appears, the service has started successfully:

@@ -83,7 +83,7 @@ log `tailf log/manager-web.log`. If a log similar to the following appears, the
 Started InLongWebApplication in 6.795 seconds (JVM running for 7.565)
 ```

-# 3. Service access verification
+## 3 Service access verification

Verify the manager-web service:

diff --git a/docs/modules/sort/introduction.md b/docs/modules/sort/introduction.md
index a215522626c..7e9f6b03f74 100644
--- a/docs/modules/sort/introduction.md
+++ b/docs/modules/sort/introduction.md
@@ -7,31 +7,31 @@ Inlong-sort is used to extract data from different source systems, then transfor
Inlong-sort is simply a Flink application, and it relies on Inlong-manager to manage metadata (such as source and storage information).

# features
-## multi-tenancy
+## 1 multi-tenancy
Inlong-sort is a multi-tenant system, which means you can extract data from different sources (these sources must be of the same source type) and load data into different sinks (these sinks must be of the same storage type).
E.g. you can extract data from different topics of inlong-tubemq and then load them into different hive clusters.

-## change meta data without restart
+## 2 change meta data without restart
Inlong-sort uses zookeeper to manage its metadata; every time you change metadata on zk, the inlong-sort application is informed immediately.
E.g. if you want to change the schema of your data, just change the metadata on zk without restarting your inlong-sort application.

-# supported sources
+## 3 supported sources
- inlong-tubemq
- pulsar

-# supported storages
+## 4 supported storages
- clickhouse
- hive (currently only the parquet file format is supported)

-# limitations
+## 5 limitations
Currently, we only support extracting specified fields in the **Transform** stage.
-# future plans -## More kinds of source systems +## 6 future plans +### 6.1 More kinds of source systems kafka and etc -## More kinds of storage systems +### 6.2 More kinds of storage systems Hbase, Elastic Search, and etc -## More kinds of file format in hive sink +### 6.3 More kinds of file format in hive sink sequence file, orc \ No newline at end of file diff --git a/docs/modules/sort/protocol_introduction.md b/docs/modules/sort/protocol_introduction.md index a04538f7cba..90eeb8d77f0 100644 --- a/docs/modules/sort/protocol_introduction.md +++ b/docs/modules/sort/protocol_introduction.md @@ -7,7 +7,7 @@ Currently the metadata management of inlong-sort relies on inlong-manager. Metadata interaction between inlong-sort and inlong-manager is performed via ZK. -# Zookeeper's path structure +## 1 Zookeeper's path structure ![img.png](img.png) @@ -20,6 +20,6 @@ A path at the top of the figure indicates which dataflow are running in a cluste The path below is used to store the details of the dataflow. -# Protocol +## 2 Protocol Please reference `org.apache.inlong.sort.protocol.DataFlowInfo` \ No newline at end of file diff --git a/docs/modules/sort/quick_start.md b/docs/modules/sort/quick_start.md index d2e46ebad3c..8823e0549ed 100644 --- a/docs/modules/sort/quick_start.md +++ b/docs/modules/sort/quick_start.md @@ -2,7 +2,7 @@ title: Build && Deployment --- -## Set up flink environment +## 1 Set up flink environment Currently inlong-sort is based on flink, before you run an inlong-sort application, you need to set up flink environment. @@ -12,10 +12,10 @@ Currently, inlong-sort relys on flink-1.9.3. Chose `flink-1.9.3-bin-scala_2.11.t Once your flink environment is set up, you can visit web ui of flink, whose address is stored in `/${your_flink_path}/conf/masters`. -## Prepare installation files +## 2 Prepare installation files All installation files at `inlong-sort` directory. -## Starting an inlong-sort application +## 3 Starting an inlong-sort application Now you can submit job to flink with the jar compiled. how to submit job to flink @@ -30,7 +30,7 @@ Notice: - `inlong-sort-core-1.0-SNAPSHOT.jar` is the compiled jar -## Necessary configurations +## 4 Necessary configurations - `--cluster-id ` which is used to represent a specified inlong-sort application - `--zookeeper.quorum` zk quorum - `--zookeeper.path.root` zk root path @@ -45,7 +45,7 @@ Configurations above are necessary, you can see full configurations in `--cluster-id my_application --zookeeper.quorum 192.127.0.1:2181 --zookeeper.path.root /zk_root --source.type tubemq --sink.type hive` -## All configurations +## 5 All configurations | name | necessary | default value |description | | ------------ | ------------ | ------------ | ------------ | |cluster-id | Y | NA | used to represent a specified inlong-sort application | diff --git a/docs/modules/tubemq/architecture.md b/docs/modules/tubemq/architecture.md index d81a6eeb9b9..cd4ad51e9b7 100644 --- a/docs/modules/tubemq/architecture.md +++ b/docs/modules/tubemq/architecture.md @@ -2,7 +2,7 @@ title: Architecture --- -## 1. TubeMQ Architecture: +## 1 TubeMQ Architecture: After years of evolution, the TubeMQ cluster is divided into the following 5 parts: ![](img/sys_structure.png) @@ -30,7 +30,7 @@ After years of evolution, the TubeMQ cluster is divided into the following 5 par - **ZooKeeper:** Responsible for the ZooKeeper part of the offset storage. This part of the function has been weakened to only the persistent storage of the offset. 
Considering the next multi-node copy function, this module is temporarily reserved; -## 2. Broker File Storage Scheme Improvement: +## 2 Broker File Storage Scheme Improvement: Systems that use disks as data persistence media are faced with various system performance problems caused by disk problems. The TubeMQ system is no exception, the performance improvement is largely to solve the problem of how to read, write and store message data. In this regard TubeMQ has made many improvements: storage instances is as the smallest Topic data management unit; each storage instance includes a file storage block and a memory cache block; each Topic can be assigned multiple storage instances. 1. **File storage block:** The disk storage solution of TubeMQ is similar to Kafka, but it is not the same, as shown in the following figure: each file storage block is composed of an index file and a data file; the partiton is a logical partition in the data file; each Topic maintains and manages the file storage block separately, the related mechanisms include the aging cycle, the number of partitions, whether it is readable and writable, etc. diff --git a/docs/modules/tubemq/tubemq-manager/quick_start.md b/docs/modules/tubemq/tubemq-manager/quick_start.md index cf27a7cddeb..bebcefb8c13 100644 --- a/docs/modules/tubemq/tubemq-manager/quick_start.md +++ b/docs/modules/tubemq/tubemq-manager/quick_start.md @@ -1,7 +1,7 @@ -## Deploy TubeMQ Manager +## 1 Deploy TubeMQ Manager All deploying files at `inlong-tubemq-manager` directory. -### configuration +### 1.1 configuration - create `tubemanager` and account in MySQL. - Add mysql information in conf/application.properties: @@ -12,13 +12,13 @@ spring.datasource.username=mysql_username spring.datasource.password=mysql_password ``` -### start service +### 1.2 start service ``` bash $ bin/start-manager.sh ``` -### register TubeMQ cluster +### 1.3 register TubeMQ cluster vim bin/init-tube-cluster.sh @@ -40,7 +40,7 @@ sh bin/init-tube-cluster.sh this will create a cluster with id = 1, note that this operation should not be executed repeatedly. -### Appendix: Other Operation interface +### 1.4 Appendix: Other Operation interface #### cluster Query full data of clusterId and clusterName (get) diff --git a/docs/modules/website/quick_start.md b/docs/modules/website/quick_start.md index 8eeaeda7107..a855ab29341 100644 --- a/docs/modules/website/quick_start.md +++ b/docs/modules/website/quick_start.md @@ -2,20 +2,20 @@ title: Build && Deployment --- -## About WebSite +## 1 About WebSite This is a website console for us to use the [Apache InLong incubator](https://github.com/apache/incubator-inlong). -## Build +## 2 Build ``` mvn package -DskipTests -Pdocker -pl inlong-website ``` -## Run +## 3 Run ``` docker run -d --name website -e MANAGER_API_ADDRESS=127.0.0.1:8083 -p 80:80 inlong/website ``` -## Guide For Developer +## 4 Guide For Developer You should check that `nodejs >= 12.0` is installed. In the project, you can run some built-in commands: @@ -33,14 +33,14 @@ The start of the web server depends on the back-end server `manger api` interfac You should start the backend server first, and then set the variable `target` in `/inlong-website/src/setupProxy.js` to the address of the api service. -### Test +### 4.1 Test Run `npm test` or `yarn test` Start the test runner in interactive observation mode. For more information, see the section on [Running Tests](https://create-react-app.dev/docs/running-tests/). 
-### Build
+### 4.2 Build

First, make sure that the project has run `npm install` or `yarn install` to install `node_modules`.

diff --git a/docs/user_guide/example.md b/docs/user_guide/example.md
index 140269b22c2..6c57d7b0490 100644
--- a/docs/user_guide/example.md
+++ b/docs/user_guide/example.md
@@ -5,17 +5,17 @@ sidebar_position: 3

Here we use a simple example to help you experience InLong by Docker.

-## Install Hive
+## 1 Install Hive
Hive is a necessary component. If you don't have Hive on your machine, we recommend using Docker to install it. Details can be found [here](https://github.com/big-data-europe/docker-hive).

> Note that if you use Docker, you need to add a port mapping `8020:8020`, because it's the port of HDFS DefaultFS, and we need to use it later.

-## Install InLong
+## 2 Install InLong
Before we begin, we need to install InLong. Here we provide two ways:
1. Install InLong with Docker according to the [instructions here](https://github.com/apache/incubator-inlong/tree/master/docker/docker-compose). (Recommended)
2. Install InLong binary according to the [instructions here](./quick_start.md).

-## Create a data access
+## 3 Create a data access
After deployment, we first enter the "Data Access" interface, click "Create an Access" in the upper right corner to create a new data access, and fill in the business information as shown in the figure below.

Create Business

@@ -38,12 +38,12 @@ Note that the target table does not need to be created in advance, as InLong Man

Then we click the "Submit for Approval" button, the connection will be created successfully and enter the approval state.

-## Approve the data access
+## 4 Approve the data access
Then we enter the "Approval Management" interface and click "My Approval" to approve the data access that we just applied for.

At this point, the data access has been created successfully. We can see that the corresponding table has been created in Hive, and that the corresponding topic has been created successfully in the management GUI of TubeMQ.

-## Configure the agent
+## 5 Configure the agent
Here we use `docker exec` to enter the container of the agent and configure it.
```
$ docker exec -it agent sh
diff --git a/docs/user_guide/quick_start.md b/docs/user_guide/quick_start.md
index 483aa4641f9..05cd33207de 100644
--- a/docs/user_guide/quick_start.md
+++ b/docs/user_guide/quick_start.md
@@ -5,7 +5,7 @@ sidebar_position: 1

This section contains a quick start guide to help you get started with Apache InLong.

-## Overall architecture
+## 1 Overall architecture

The overall architecture of [Apache InLong](https://inlong.apache.org) (incubating) is shown above. This component is a one-stop data streaming platform that provides automated, secure, distributed, and efficient data publishing and subscription capabilities to help you easily build stream-based data applications.

@@ -15,7 +15,7 @@ InLong (应龙) is a divine beast in Chinese mythology who guides river into the

InLong was originally built in Tencent and has served online businesses for more than 8 years. It supports massive data reporting (over 40 trillion records per day) in big data scenarios. The entire platform integrates five modules: data collection, aggregation, caching, sorting, and management.
Through this system, the business only needs to provide data sources, data service quality, data landing clusters and data landing formats, that is, data can be continuous Push data from the source cluster to the target cluster, which greatly meets the data reporting service requirements in the business big data scenario. -## Compile +## 2 Compile - Java [JDK 8](https://adoptopenjdk.net/?variant=openjdk8) - Maven 3.6.1+ @@ -39,38 +39,38 @@ inlong-tubemq-server inlong-website ``` -## Environment Requirements +## 3 Environment Requirements - ZooKeeper 3.5+ - Hadoop 2.10.x 和 Hive 2.3.x - MySQL 5.7+ - Flink 1.9.x -## deploy InLong TubeMQ Server +## 4 deploy InLong TubeMQ Server [deploy InLong TubeMQ Server](modules/tubemq/quick_start.md) -## deploy InLong TubeMQ Manager +## 5 deploy InLong TubeMQ Manager [deploy InLong TubeMQ Manager](modules/tubemq/tubemq-manager/quick_start.md) -## deploy InLong Manager +## 6 deploy InLong Manager [deploy InLong Manager](modules/manager/quick_start.md) -## deploy InLong WebSite +## 7 deploy InLong WebSite [deploy InLong WebSite](modules/website/quick_start.md) -## deploy InLong Sort +## 8 deploy InLong Sort [deploy InLong Sort](modules/sort/quick_start.md) -## deploy InLong DataProxy +## 9 deploy InLong DataProxy [deploy InLong DataProxy](modules/dataproxy/quick_start.md) -## deploy InLong DataProxy-SDK +## 10 deploy InLong DataProxy-SDK [deploy InLong DataProxy](modules/dataproxy-sdk/quick_start.md) -## deploy InLong Agent +## 11 deploy InLong Agent [deploy InLong Agent](modules/agent/quick_start.md) -## Business configuration +## 12 Business configuration [How to configure a new business](docs/user_guide/user_manual) -## Data report verification +## 13 Data report verification At this stage, you can collect data through the file agent and verify whether the received data is consistent with the sent data in the specified Hive table. diff --git a/docs/user_guide/user_manual.md b/docs/user_guide/user_manual.md index 83bdf333f85..8f516a430cb 100644 --- a/docs/user_guide/user_manual.md +++ b/docs/user_guide/user_manual.md @@ -3,13 +3,13 @@ title: User Manual sidebar_position: 2 --- -# 1. User login +## 1 User login Requires the user to enter the account name and password of the system. ![](/cookbooks_img//image-1624433272455.png) -# 2. Data access +## 2 Data access The data access module displays a list of all tasks connected to the system within the current user authority, and can view, edit, update and delete the details of these tasks. @@ -18,9 +18,9 @@ Click [Data Access], there are two steps to fill in data access information: bus ![](/cookbooks_img//image-1624431177918.png) -## 2.1 Business Information +### 2.1 Business Information -### 2.1.1 Business Information +#### 2.1.1 Business Information You are required to fill in basic business information for access tasks. @@ -33,7 +33,7 @@ You are required to fill in basic business information for access tasks. information, add and modify all access configuration items - Business introduction: Cut SMS to introduce the business background and application of this access task: -### 2.1.2 Access requirements +#### 2.1.2 Access requirements Access requirements require users to choose message middleware: high throughput (TUBE): @@ -41,14 +41,14 @@ Access requirements require users to choose message middleware: high throughput High-throughput-Tube: high-throughput message transmission component, suitable for log message transmission. 
-### 2.1.3 Access scale +#### 2.1.3 Access scale The scale of access requires users to judge the scale of access data in advance, to allocate computing and storage resources later. ![](/cookbooks_img//image-1624431333949.png) -## 2.2 Data stream +### 2.2 Data stream Click [Next] to enter the data flow information filling step. There are four modules for data flow information filling: basic information, data source, data information, and data stream. @@ -57,7 +57,7 @@ In the data flow process, you can click [New Data Stream] to create a new data s ![](/cookbooks_img//image-1624431416449.png) -### 2.2.1 Basic information +#### 2.2.1 Basic information You are required to fill in the basic information of the data stream in the access task: @@ -70,7 +70,7 @@ You are required to fill in the basic information of the data stream in the acce configuration items - Introduction to data flow: simple text introduction to data flow -### 2.2.2 Data source +#### 2.2.2 Data source You are required to select the source of the data stream. @@ -83,7 +83,7 @@ be supplemented in the advanced options. ![](/cookbooks_img//image-1624431594406.png) -### 2.2.3 Data Information +#### 2.2.3 Data Information You are required to fill in the data-related information in the data stream. @@ -95,7 +95,7 @@ You are required to fill in the data-related information in the data stream. - Source field separator: the format of data sent to MQ - Source data field: attributes with different meanings divided by a certain format in MQ -### 2.2.4 Data storage +#### 2.2.4 Data storage You are required to select the final flow direction of this task, this part is not currently supports both hive storage and autonomous push. @@ -117,9 +117,9 @@ Add HIVE storage: - Field related information: source field name, source field type, HIVE field name, HIVE field type, field description, and support deletion and addition- -# 3. Access details +## 3 Access details -## 3.1 Execution log +### 3.1 Execution log When the status of the data access task is "approved successfully" or "configuration failed", the "execution log" function can be used to allow users to view the progress and details of the task. @@ -133,34 +133,34 @@ Click [Execution Log] to display the details of the task execution log in a pop- The execution log will display the task type, execution result, execution log content, end time, and the end time of the execution of the access process. If the execution fails, you can "restart" the task and execute it again. -## 3.2 Task details +### 3.2 Task details The business person in charge/following person can view the access details of the task, and can modify and update part of the information under the status of [Waiting Applying], [Configuration Successful], and [Configuration Failed]. There are three modules in the access task details: business information, data stream and data storage. 
-### 3.2.1 Business Information +#### 3.2.1 Business Information Display the basic business information in the access task, click [Edit] to modify part of the content ![](/cookbooks_img//image-1624432076857.png) -### 3.2.2 Data stream +#### 3.2.2 Data stream Display the basic information of the data flow under the access task, click [New Data Flow] to create a new data flow information ![](/cookbooks_img//image-1624432092795.png) -### 3.2.3 Data Storage +#### 3.2.3 Data Storage Display the basic information of the data flow in the access task, select different flow types through the drop-down box, and click [New Flow Configuration] to create a new data storage. ![](/cookbooks_img//image-1624432114765.png) -# 4. Data consumption +## 4 Data consumption Data consumption currently does not support direct consumption access to data, and data can be consumed normally after the approval process. @@ -170,7 +170,7 @@ consumption. ![](/cookbooks_img//image-1624432235900.png) -## 4.1 Consumer Information +### 4.1 Consumer Information Applicants need to gradually fill in the basic consumer business information related to data consumption applications in the information filling module @@ -190,40 +190,40 @@ the information filling module their own consumption scenarios After completing the information, click [Submit], and the data consumption process will be formally submitted to the approver before it will take effect. -# 5. Approval management +## 5 Approval management The approval management function module currently includes my application and my approval, and all tasks of data access and consumption application approval in the management system. -## 5.1 My application +### 5.1 My application Display the current task list submitted by the applicant for data access and consumption in the system, click [Details] to view the current basic information and approval process of the task. ![](/cookbooks_img//image-1624432445002.png) -### 5.1.1 Data access details +#### 5.1.1 Data access details Data access task detailed display The current basic information of the application task includes: applicant-related information, basic information about application access, and current approval process nodes. ![](/cookbooks_img//image-1624432458971.png) -### 5.1.2 Data consumption details +#### 5.1.2 Data consumption details Data consumption task details display basic information of current application tasks including: applicant information, basic consumption information, and current approval process nodes. ![](/cookbooks_img//image-1624432474526.png) -## 5.2 My approval +### 5.2 My approval As a data access officer and system member with approval authority, have the responsibility for data access or consumption approval. ![](/cookbooks_img//image-1624432496461.png) -### 5.2.1 Data Access Approval +#### 5.2.1 Data Access Approval New data access approval: currently it is a first-level approval, which is approved by the system administrator. @@ -232,7 +232,7 @@ business information. ![](/cookbooks_img//image-1624432515850.png) -### 5.2.2 New data consumption approval +#### 5.2.2 New data consumption approval New data consume approval: currently it is a first-level approval, which is approved by the person in charge of the business. @@ -242,13 +242,13 @@ requirements according to the access information: ![](/cookbooks_img//image-1624432535541.png) -# 6. System Management +## 6 System Management Only users with the role of system administrator can use this function. 
They can create, modify, and delete users: ![](/cookbooks_img//image-1624432652141.png) -## 6.1 New user +### 6.1 New user Users with system administrator rights can create new user accounts @@ -262,13 +262,13 @@ Users with system administrator rights can create new user accounts -Effective duration: the account can be used in the system ![](/cookbooks_img//image-1624432740241.png) -## 6.2 Delete user +### 6.2 Delete user The system administrator can delete the account of the created user. After the deletion, the account will stop using: ![](/cookbooks_img//image-1624432759224.png) -## 6.3 User Edit +### 6.3 User Edit The system administrator can modify the created account: @@ -278,7 +278,7 @@ The system administrator can modify the account type and effective duration to p ![](/cookbooks_img//image-1624432797226.png) -## 6.4 Change password +### 6.4 Change password The user can modify the account password, click [Modify Password], enter the old password and the new password, after confirmation, the new password of this account will take effect: diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/architecture.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/architecture.md index e7b65aa6f35..677898f8fa0 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/architecture.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/architecture.md @@ -1,7 +1,7 @@ --- title: 架构介绍 --- -## 一. InLong-Agent 概览 +## 1 InLong-Agent 概览 InLong-Agent是一个支持多种数据源类型的收集工具,致力于实现包括file、sql、Binlog、metrics等多种异构数据源之间稳定高效的数据采集功能。 ### 简要的架构图如下: @@ -15,7 +15,7 @@ InLong-Agent是一个支持多种数据源类型的收集工具,致力于实 ### 当前使用现状 InLong-Agent在腾讯集团内被广泛使用,承担了大部分的数据采集业务,线上数据量达百亿级别。 -## 二. InLong-Agent 架构介绍 +## 2 InLong-Agent 架构介绍 InLong Agent本身作为数据采集框架,采用channel + plugin架构构建。将数据源读取和写入抽象成为Reader/Writer插件,纳入到整个框架中。 + Reader:Reader为数据采集模块,负责采集数据源的数据,将数据发送给channel。 @@ -23,7 +23,7 @@ InLong Agent本身作为数据采集框架,采用channel + plugin架构构建 + Channel:Channel用于连接reader和writer,作为两者的数据传输通道,并起到了数据的写入读取监控作用 -## 三. 
InLong-Agent 采集分类说明 +## 3 InLong-Agent 采集分类说明 ### 3.1 文件采集 文件采集包含如下功能: diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md index 714c318e7d6..a5bff20642a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md @@ -2,14 +2,14 @@ title: 编译部署 --- -## 1、配置 +## 1 配置 ``` cd inlong-agent ``` agent 支持本地运行以及线上运行,其中线上运行从inlong manager拉取任务,本地运行可使用http请求提交任务 -### Agent 线上运行相关设置 +### 1.1 Agent 线上运行相关设置 线上运行需要从inlong-manager拉取配置,配置conf/agent.properties如下: ```ini @@ -19,16 +19,16 @@ agent.manager.vip.http.host=manager web host agent.manager.vip.http.port=manager web port ``` -## 2、运行 +## 2 运行 解压后如下命令运行 ```bash sh agent.sh start ``` -### 3 实时添加job配置 +## 3 实时添加job配置 -#### 3.1 agent.properties 修改下面两处 +### 3.1 agent.properties 修改下面两处 ```ini # whether enable http service @@ -37,7 +37,7 @@ agent.http.enable=true agent.http.port=可用端口 ``` -#### 3.2 执行如下命令: +### 3.2 执行如下命令: ```bash curl --location --request POST 'http://localhost:8008/config/job' \ @@ -76,7 +76,7 @@ curl --location --request POST 'http://localhost:8008/config/job' \ - proxy.groupId: 写入proxy时使用的groupId,groupId是指manager界面中,数据接入中业务信息的业务ID,此处不是创建的tube topic名称 - proxy.streamId: 写入proxy时使用的streamId,streamId是指manager界面中,数据接入中数据流的数据流ID -## 4、可支持的路径配置方案 +## 4 可支持的路径配置方案 例如: /data/inlong-agent/test.log //代表读取inlong-agent文件夹下的的新增文件test.log @@ -85,7 +85,7 @@ curl --location --request POST 'http://localhost:8008/config/job' \ /data/inlong-agent/^\\d+(\\.\\d+)? // 以一个或多个数字开头,之后可以是.或者一个.或多个数字结尾,?代表可选,可以匹配的实例:"5", "1.5" 和 "2.21" -## 5、支持从文件名称中获取数据时间 +## 5 支持从文件名称中获取数据时间 Agent支持从文件名称中获取时间当作数据的生产时间,配置说明如下: /data/inlong-agent/***YYYYMMDDHH*** @@ -141,7 +141,7 @@ curl --location --request POST 'http://localhost:8008/config/job' \ ``` -## 6、支持时间偏移量offset读取 +## 6 支持时间偏移量offset读取 在配置按照时间读取之后,如果想要读取当前时间之外的其他时间的数据,可以通过配置时间偏移量完成 配置job属性名称为job.timeOffset,值为数字 + 时间维度,时间维度包括天和小时 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy-sdk/architecture.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy-sdk/architecture.md index 40a8a59a184..ea69a876155 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy-sdk/architecture.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy-sdk/architecture.md @@ -1,7 +1,7 @@ --- title: 架构介绍 --- -# 一、说明 +## 1 说明 在业务使用消息接入方式时,业务一般仅需将数据按照DataProxy可识别的格式(如六段协议、数字化协议等) 进行组包发送,就可以将数据接入到inlong。但为了保证数据可靠性、负载均衡、动态更新proxy列表等安全特性 @@ -9,9 +9,9 @@ title: 架构介绍 API的设计初衷就是为了简化用户接入,承担部分可靠性相关的逻辑。用户通过在服务送程序中集成API后,即可将数据发送到DataProxy,而不用关心组包格式、负载均衡等逻辑。 -# 二、功能说明 +## 2 功能说明 -## 2.1 整体功能说明 +### 2.1 整体功能说明 | 功能 | 详细描述 | | ---- | ---- | @@ -23,9 +23,9 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关 | DataProxy列表持久化(新) | 根据业务id对DataProxy列表持久化,防止程序启动时配置中心发生故障无法发送数据 -## 2.2 数据发送功能说明 +### 2.2 数据发送功能说明 -### 同步批量函数 +#### 同步批量函数 public SendResult sendMessage(List bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit) @@ -35,7 +35,7 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关 -###同步单条函数 +#### 同步单条函数 public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit) @@ -45,7 +45,7 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关 -###异步批量函数 +#### 异步批量函数 public void asyncSendMessage(SendMessageCallback callback, List bodyList, String groupId, String streamId, long dt, long timeout,TimeUnit timeUnit) @@ -54,7 
+54,7 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关 SendMessageCallback 是处理消息的callback。bodyList为用户需要发送的多条数据的集合,多条数据的总长度建议小于512k。groupId是业务id,streamId是接口id。dt表示该数据的时间戳,精确到毫秒级别。也可直接设置为0,此时api会后台获取当前时间作为其时间戳。timeout和timeUnit是发送数据的超时时间,一般建议设置成20s。 -###异步单条函数 +#### 异步单条函数 public void asyncSendMessage(SendMessageCallback callback, byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/architecture.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/architecture.md index 23fe487329f..272e0a8b82f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/architecture.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/architecture.md @@ -1,7 +1,7 @@ --- title: 架构介绍 --- -# 一、说明 +## 1 说明 InLong-dataProxy属于inlong proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式 InLong-dataProxy充当了InLong采集端到InLong缓冲端的桥梁,dataproxy从manager模块拉取业务id与对应topic名称的关系,内部管理多个topic的生产者 @@ -9,7 +9,7 @@ title: 架构介绍 InLong-dataProxy整体架构基于Apache Flume。inlong-dataproxy在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。 -# 二、架构 +## 2 架构 ![](img/architecture.png) @@ -18,7 +18,7 @@ title: 架构介绍 3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube) -# 三、DataProxy功能配置说明 +## 3 DataProxy功能配置说明 DataProxy支持配置化的source-channel-sink,配置方式与flume的配置文件结构相同: @@ -157,7 +157,7 @@ agent1.sinks.meta-sink-more1.max-survived-size = 3000000 缓存最大个数 ``` -# 4、监控指标配置说明 +## 4 监控指标配置说明 DataProxy提供了JMX方式的监控指标Listener能力,用户可以实现MetricListener接口,注册后可以定期接收监控指标,用户选择将指标上报自定义的监控系统。Source和Sink模块可以通过将指标数据统计到org.apache.inlong.commons.config.metrics.MetricItemSet的子类中,并注册到MBeanServer。用户自定义的MetricListener通过JMX方式收集指标数据并上报到外部监控系统 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/quick_start.md index 18e7df103ae..72eaacd4dbb 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/quick_start.md @@ -1,11 +1,11 @@ --- title: 编译部署 --- -## 部署 DataProxy +## 1 部署 DataProxy 所有的安装文件都在 `inlong-dataproxy` 目录下。 -### 配置tube地址和端口号 +### 1.1 配置tube地址和端口号 `tubemq_master_list`是TubeMQ master rpc地址,多个逗号分隔。 ``` @@ -14,13 +14,13 @@ $ sed -i 's/TUBE_LIST/tubemq_master_list/g' conf/flume.conf 注意conf/flume.conf中FLUME_HOME为proxy的中间数据文件存放地址 -### 环境准备 +### 1.2 环境准备 ``` sh prepare_env.sh ``` -### 配置manager地址 +### 1.3 配置manager地址 配置文件:`conf/common.properties`: ``` @@ -28,19 +28,19 @@ sh prepare_env.sh manager_hosts=ip:port ``` -## 启动 +## 2 启动 ``` sh bin/start.sh ``` -## 检查启动状态 +## 3 检查启动状态 ``` telnet 127.0.0.1 46801 ``` -## 将 DataProxy 配置添加到 InLong-Manager +## 4 将 DataProxy 配置添加到 InLong-Manager 安装完 DataProxy 后,需要将 DataProxy 所在主机的 IP 插入到 InLong-Manager 的后台数据库中。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/architecture.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/architecture.md index 9a01a352177..76503660cf8 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/architecture.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/architecture.md @@ -2,7 +2,7 @@ title: 架构介绍 --- -## Apache InLong Manager介绍 +## 1 Apache InLong Manager介绍 + 目标定位:Apache inlong 定位为一站式数据接入解决方案,提供完整覆盖大数据接入场景从数据采集、传输、分拣、落地的技术能力。 @@ -10,12 +10,12 @@ title: 架构介绍 + Apache InLong 
Manager作为整个数据接入平台面向用户的统一管理入口,用户登录后会根据对应角色提供不同的功能权限以及数据权限。页面提供平台各基础集群(如mq、分拣)维护入口,可随时查看维护基本信息、容量规划调整。同时业务用户可完成数据接入任务的创建、修改维护、指标查看、接入对账等功能。其对应的后台服务在用户创建并启动任务的同时会与底层各模块进行数据交互,将各模块需要执行的任务通过合理的方式下发。起到协调串联后台业务执行流程的作用。 -## Architecture +## 2 Architecture ![](img/inlong-manager.png) -## 模块分工 +## 3 模块分工 | 模块 | 职责 | | :-----| :---- | @@ -25,9 +25,9 @@ title: 架构介绍 | manager-web | 前端交互对应接口 | | manager-workflow-engine | 工作流引擎| -## 系统使用流程 +## 4 系统使用流程 ![](img/interactive.jpg) -## 数据模型 +## 5 数据模型 ![](img/datamodel.jpg) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/quick_start.md index 2ca2d38ca1b..2de98be3388 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/quick_start.md @@ -2,7 +2,7 @@ title: 编译部署 --- -# 1. 环境准备 +## 1 环境准备 - 安装并启动 MySQL 5.7+,把 inlong-manager 模块中的 `doc/sql/apache_inlong_manager.sql` 文件拷贝到 MySQL 数据库所在的服务器 (比如拷贝到 `/data/` 目录下),通过下述命令加载此文件,完成表结构及基础数据的初始化: @@ -22,15 +22,15 @@ title: 编译部署 - 参照 [编译部署TubeMQ Manager](https://inlong.apache.org/zh-cn/docs/modules/tubemq/tubemq-manager/quick_start.html),安装并启动 TubeManager。 -# 2. 部署、启动 manager-web +## 2 部署、启动 manager-web **manager-web 是与前端页面交互的后台服务。** -## 2.1 准备安装文件 +### 2.1 准备安装文件 安装文件在 `inlong-manager-web` 目录下。 -## 2.2 修改配置 +### 2.2 修改配置 前往 `inlong-manager-web` 目录,修改 `conf/application.properties` 文件: @@ -70,7 +70,7 @@ spring.profiles.active=dev sort.appName=inlong_app ``` -## 2.3 启动服务 +### 2.3 启动服务 进入解压后的目录,执行 `sh bin/startup.sh` 启动服务,查看日志 `tailf log/manager-web.log`,若出现类似下面的日志,说明服务启动成功: @@ -78,7 +78,7 @@ spring.profiles.active=dev Started InLongWebApplication in 6.795 seconds (JVM running for 7.565) ``` -# 3. 
服务访问验证 +## 3 服务访问验证 在浏览器中访问如下地址,验证 manager-web 服务: diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/introduction.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/introduction.md index 3b4404cf87e..7d24753b8b8 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/introduction.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/introduction.md @@ -7,36 +7,36 @@ inlong-sort是一个基于flink的ETL系统,支持多种数据源,支持简 inlong-sort依赖inlong-manager进行系统元数据的管理,元数据依赖zk进行存储及同步。 # 特性 -## 多租户系统 +## 1 多租户系统 inlong-sort支持多租户,一个inlong-sort的作业中可以包含多个同构的数据源,以及多个同构的存储系统。 并且针对不同的数据源,可以定义不同的数据格式以及字段抽取方式。 多租户系统依赖inlong-manager的元数据管理,用户只需要在inlong-manager的前端页面进行相应的配置,即可实现。 举例:以tubemq为source,hive为存储为例,同一个inlong-sort的作业可以订阅多个topic的tubemq数据,并且每个topic的数据可以写入不同的hive集群。 -## 支持热更新元数据 +## 2 支持热更新元数据 inlong-sort支持热更新元数据,比如更新数据源的信息,数据schema,或者写入存储系统的信息。 需要注意的是,当前修改数据源信息时,可能会造成数据丢失,因为修改数据源信息后,系统会认为这是一个全新的subscribe,会默认从消息队列的最新位置开始消费。 修改数据schema,抽取字段规则以及写入存储的信息,不会造成任何数据丢失,保证exactly-once -# 支持的数据源 +## 3 支持的数据源 - inlong-tubemq - pulsar -# 支持的存储系统 +## 4 支持的存储系统 - hive(当前只支持parquet文件格式) - clickhouse -# 一些局限 +## 5 一些局限 当前inlong-sort在ETL的transform阶段,只支持简单的字段抽取功能,一些复杂功能暂不支持。 -# 未来规划 -## 支持更多种类的数据源 +## 6 未来规划 +### 6.1 支持更多种类的数据源 kafka等 -## 支持更多种类的存储 +### 6.2 支持更多种类的存储 Hbase,Elastic Search等 -## 支持更多种写入hive的文件格式 +### 6.3 支持更多种写入hive的文件格式 sequece file,orc \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/protocol_introduction.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/protocol_introduction.md index c5504d5513d..f5c48ef2116 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/protocol_introduction.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/protocol_introduction.md @@ -7,7 +7,7 @@ title: Zookeeper配置介绍 inlong-sort与inlong-manager之间通过zk进行元数据的交互。 -# Zookeeper结构 +## 1 Zookeeper结构 ![img.png](img.png) @@ -21,5 +21,5 @@ dataflow代表一个具体的流向,每个流向有一个全局唯一的id来 元数据管理逻辑可以查看类`org.apache.inlong.sort.meta.MetaManager` -# 协议设计 +## 2 协议设计 具体的协议可以查看类`org.apache.inlong.sort.protocol.DataFlowInfo` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md index 334dd52b809..fc032230687 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md @@ -2,7 +2,7 @@ title: 编译部署 --- -## 配置flink运行环境 +## 1 配置flink运行环境 当前inlong-sort是基于flink的一个应用,因此运行inlong-sort应用前,需要准备好flink环境。 [如何配置flink环境](https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/cluster_setup.html "how to set up flink environment") @@ -11,10 +11,10 @@ title: 编译部署 flink环境配置完成后,可以通过浏览器访问flink的web ui,对应的地址是`/{flink部署路径}/conf/masters`文件中的地址 -## 准备安装文件 +## 2 准备安装文件 安装文件在`inlong-sort`目录。 -## 启动inlong-sort应用 +## 3 启动inlong-sort应用 有了上述编译阶段产出的jar包后,就可以启动inlong-sort的应用了。 [如何提交flink作业](https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/yarn_setup.html#submit-job-to-flink "如何提交flink作业") @@ -29,7 +29,7 @@ flink环境配置完成后,可以通过浏览器访问flink的web ui,对应 - `inlong-sort-core-1.0-SNAPSHOT.jar` 为编译阶段产出的jar包 -## 必要的配置 +## 4 必要的配置 - `--cluster-id ` 用来唯一标识一个inlong-sort作业 - `--zookeeper.quorum` zk quorum - `--zookeeper.path.root` zk根目录 @@ -40,7 +40,7 @@ flink环境配置完成后,可以通过浏览器访问flink的web ui,对应 `--cluster-id my_application --zookeeper.quorum 
192.127.0.1:2181 --zookeeper.path.root /zk_root --source.type tubemq --sink.type hive` -## 所有支持的配置 +## 5 所有支持的配置 | 配置名 | 是否必须 | 默认值 |描述 | | ------------ | ------------ | ------------ | ------------ | |cluster-id | Y | NA | 用来唯一标识一个inlong-sort作业 | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/clients_java.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/clients_java.md index 66fb611e45c..a767e4ef59e 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/clients_java.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/clients_java.md @@ -150,11 +150,11 @@ public class DefaultMessageListener implements MessageListener { ``` -### 3 创建Producer: +## 3 创建Producer: 现网环境中业务的数据都是通过代理层来做接收汇聚,包装了比较多的异常处理,大部分的业务都没有也不会接触到TubeSDK的Producer类,考虑到业务自己搭建集群使用TubeMQ进行使用的场景,这里提供对应的使用demo,见包org.apache.inlong.tubemq.example.MessageProducerExample类文件供参考,**需要注意**的是,业务除非使用数据平台的TubeMQ集群做MQ服务,否则仍要按照现网的接入流程使用代理层来进行数据生产: -#### 3.1 初始化MessageProducerExample类: +### 3.1 初始化MessageProducerExample类: 和Consumer的初始化类似,也是构造了一个封装类,定义了一个会话工厂,以及一个Producer类,生产端的会话工厂初始化通过TubeClientConfig类进行,如之前所介绍的,ConsumerConfig类是TubeClientConfig类的子类,虽然传入参数不同,但会话工厂是通过TubeClientConfig类完成的初始化处理: @@ -182,7 +182,7 @@ public final class MessageProducerExample { ``` -#### 3.2 发布Topic: +### 3.2 发布Topic: ```java public void publishTopics(List topicList) throws TubeClientException { @@ -191,7 +191,7 @@ public void publishTopics(List topicList) throws TubeClientException { ``` -#### 3.3 进行数据生产: +### 3.3 进行数据生产: 如下所示,则为具体的数据构造和发送逻辑,构造一个Message对象后调用sendMessage()函数发送即可,有同步接口和异步接口选择,依照业务要求选择不同接口;需要注意的是该业务根据不同消息调用message.putSystemHeader()函数设置消息的过滤属性和发送时间,便于系统进行消息过滤消费,以及指标统计用。完成这些,一条消息即被发送出去,如果返回结果为成功,则消息被成功的接纳并且进行消息处理,如果返回失败,则业务根据具体错误码及错误提示进行判断处理,相关错误详情见《TubeMQ错误信息介绍.xlsx》: @@ -218,7 +218,7 @@ public void sendMessageAsync(int id, long currtime, String topic, byte[] body, M ``` -#### 3.5 Producer不同类MAMessageProducerExample关注点: +### 3.4 Producer不同类MAMessageProducerExample关注点: 该类初始化与MessageProducerExample类不同,采用的是TubeMultiSessionFactory多会话工厂类进行的连接初始化,该demo提供了如何使用多会话工厂类的特性,可以用于通过多个物理连接提升系统吞吐量的场景(TubeMQ通过连接复用模式来减少物理连接资源的使用),恰当使用可以提升系统的生产性能。在Consumer侧也可以通过多会话工厂进行初始化,但考虑到消费是长时间过程处理,对连接资源的占用比较小,消费场景不推荐使用。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/quick_start.md index e806f92c05b..f4925b2fd8b 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/quick_start.md @@ -1,7 +1,7 @@ --- title: 快速开始 --- -## 部署运行 +## 1 部署运行 ### 1.1 配置示例 TubeMQ 集群包含有两个组件: **Master** 和 **Broker**. 
Master 和 Broker 可以部署在相同或者不同的节点上,依照业务对机器的规划进行处理。我们通过如下3台机器搭建有2台Master的生产、消费的集群进行配置示例: @@ -126,8 +126,8 @@ Broker启动前,首先要在Master上配置Broker元数据,增加Broker相 刷新页面可以看到 Broker 已经注册,当 `当前运行子状态` 为 `idle` 时, 可以增加topic: ![Add Broker 3](img/tubemq-add-broker-3.png) -## 3 快速使用 -### 3.1 新增 Topic +## 2 快速使用 +### 2.1 新增 Topic 可以通过 web GUI 添加 Topic, 在 `Topic列表`页面添加,需要填写相关信息,比如增加`demo` topic: ![Add Topic 1](img/tubemq-add-topic-1.png) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq-manager/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq-manager/quick_start.md index 41a0e4eb786..b4d551f8853 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq-manager/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq-manager/quick_start.md @@ -1,7 +1,7 @@ -## 部署TubeMQ Manager +## 1 部署TubeMQ Manager 安装文件在inlong-tubemq-manager目录. -### 配置 +### 1.1 配置 - 在mysql中创建`tubemanager`数据和相应用户. - 在conf/application.properties中添加mysql信息: @@ -12,13 +12,13 @@ spring.datasource.username=mysql_username spring.datasource.password=mysql_password ``` -### 启动服务 +### 1.2 启动服务 ``` bash $ bin/start-manager.sh ``` -### 初始化TubeMQ集群 +### 1.3 初始化TubeMQ集群 vim bin/init-tube-cluster.sh @@ -38,7 +38,7 @@ sh bin/init-tube-cluster.sh ``` 如上操作会创建一个clusterId为1的tube集群,注意该操作只进行一次,之后重启服务无需新建集群 -### 附录:其它操作接口 +### 1.4 附录:其它操作接口 #### cluster 查询clusterId以及clusterName全量数据 (get) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md index adebec80f76..46273206b5c 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md @@ -176,22 +176,22 @@ TubeMQ是腾讯大数据自研的分布式消息中间件。其系统架构思 ![](img/perf_scenario_8_index.png) ## 6 附录 -## 6.1 附录1 不同机型下资源占用情况图: -### 6.1.1 【BX1机型测试】 +### 6.1 附录1 不同机型下资源占用情况图: +#### 6.1.1 【BX1机型测试】 ![](img/perf_appendix_1_bx1_1.png) ![](img/perf_appendix_1_bx1_2.png) ![](img/perf_appendix_1_bx1_3.png) ![](img/perf_appendix_1_bx1_4.png) -### 6.1.2 【CG1机型测试】 +#### 6.1.2 【CG1机型测试】 ![](img/perf_appendix_1_cg1_1.png) ![](img/perf_appendix_1_cg1_2.png) ![](img/perf_appendix_1_cg1_3.png) ![](img/perf_appendix_1_cg1_4.png) -## 6.2 附录2 多Topic测试时的资源占用情况图: +### 6.2 附录2 多Topic测试时的资源占用情况图: -### 6.2.1 【100个topic】 +#### 6.2.1 【100个topic】 ![](img/perf_appendix_2_topic_100_1.png) ![](img/perf_appendix_2_topic_100_2.png) ![](img/perf_appendix_2_topic_100_3.png) @@ -202,7 +202,7 @@ TubeMQ是腾讯大数据自研的分布式消息中间件。其系统架构思 ![](img/perf_appendix_2_topic_100_8.png) ![](img/perf_appendix_2_topic_100_9.png) -### 6.2.2 【200个topic】 +#### 6.2.2 【200个topic】 ![](img/perf_appendix_2_topic_200_1.png) ![](img/perf_appendix_2_topic_200_2.png) ![](img/perf_appendix_2_topic_200_3.png) @@ -213,7 +213,7 @@ TubeMQ是腾讯大数据自研的分布式消息中间件。其系统架构思 ![](img/perf_appendix_2_topic_200_8.png) ![](img/perf_appendix_2_topic_200_9.png) -### 6.2.3 【500个topic】 +#### 6.2.3 【500个topic】 ![](img/perf_appendix_2_topic_500_1.png) ![](img/perf_appendix_2_topic_500_2.png) ![](img/perf_appendix_2_topic_500_3.png) @@ -224,7 +224,7 @@ TubeMQ是腾讯大数据自研的分布式消息中间件。其系统架构思 ![](img/perf_appendix_2_topic_500_8.png) ![](img/perf_appendix_2_topic_500_9.png) -### 6.2.4 【1000个topic】 +#### 6.2.4 【1000个topic】 ![](img/perf_appendix_2_topic_1000_1.png) ![](img/perf_appendix_2_topic_1000_2.png) 
![](img/perf_appendix_2_topic_1000_3.png) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/website/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/website/quick_start.md index 9d8441f1090..157bb07ebb2 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/website/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/website/quick_start.md @@ -2,20 +2,20 @@ title: 编译部署 --- -## 关于 WebSite +## 1 关于 WebSite WebSite[Apache InLong incubator](https://github.com/apache/incubator-inlong)的管控端。 -## 编译 +## 2 编译 ``` mvn package -DskipTests -Pdocker -pl inlong-website ``` -## 运行 +## 3 运行 ``` docker run -d --name website -e MANAGER_API_ADDRESS=127.0.0.1:8083 -p 80:80 inlong/website ``` -## 开发指引 +## 4 开发指引 确认 `nodejs >= 12.0` 已经安装。 @@ -34,14 +34,14 @@ web服务器的启动依赖于后端服务 `manger api` 接口。 您应该先启动后端服务器,然后将 `/inlong-website/src/setupProxy.js` 中的变量`target` 设置为api服务的地址。 -### 测试 +### 4.1 测试 运行 `npm test` 或 `yarn test` 在交互式观察模式下启动测试运行器。 有关更多信息,请参阅有关 [运行测试](https://create-react-app.dev/docs/running-tests/) 的部分。 -### 构建 +### 4.2 构建 首先保证项目已运行过 `npm install` 或 `yarn install` 安装了 `node_modules`。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/example.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/example.md index d211033baaf..ae65018f1db 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/example.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/example.md @@ -6,18 +6,18 @@ sidebar_position: 3 本节用一个简单的示例,帮助您使用 Docker 快速体验 InLong 的完整流程。 -## 安装 Hive +## 1 安装 Hive Hive 是运行的必备组件。如果您的机器上没有 Hive,这里推荐使用 Docker 进行快速安装,详情可见 [这里](https://github.com/big-data-europe/docker-hive)。 > 注意,如果使用以上 Docker 镜像的话,我们需要在 namenode 中添加一个端口映射 `8020:8020`,因为它是 HDFS DefaultFS 的端口,后面在配置 Hive 时需要用到。 -## 安装 InLong +## 2 安装 InLong 在开始之前,我们需要安装 InLong 的全部组件,这里提供两种方式: 1. 按照 [这里的说明](https://github.com/apache/incubator-inlong/tree/master/docker/docker-compose),使用 Docker 进行快速部署。(推荐) 2. 
按照 [这里的说明](./quick_start.md),使用二进制包依次安装各组件。 -## 新建接入 +## 3 新建接入 部署完毕后,首先我们进入 “数据接入” 界面,点击右上角的 “新建接入”,新建一条接入,按下图所示填入业务信息 Create Business @@ -40,12 +40,12 @@ Hive 是运行的必备组件。如果您的机器上没有 Hive,这里推荐 然后点击“提交审批”按钮,该接入就会创建成功,进入审批状态。 -## 审批接入 +## 4 审批接入 进入“审批管理”界面,点击“我的审批”,将刚刚申请的接入通过。 到此接入就已经创建完毕了,我们可以在 Hive 中看到相应的表已经被创建,并且在 TubeMQ 的管理界面中可以看到相应的 topic 已经创建成功。 -## 配置 agent +## 5 配置 agent 然后我们使用 docker 进入 agent 容器内,创建相应的 agent 配置。 ``` $ docker exec -it agent sh diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/quick_start.md index 06db88f92af..934515b91da 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/quick_start.md @@ -5,7 +5,7 @@ sidebar_position: 1 本节包含快速入门指南,可帮助您开始使用 Apache InLong。 -## 整体架构 +## 1 整体架构 Apache InLong [Apache InLong](https://inlong.apache.org)(incubating) 整体架构如上,该组件是一站式数据流媒体平台,提供自动化、安全、分布式、高效的数据发布和订阅能力,帮助您轻松构建基于流的数据应用程序。 @@ -14,7 +14,7 @@ InLong(应龙)是中国神话故事里的神兽,可以引流入海,借喻InL InLong(应龙) 最初建于腾讯,服务线上业务8年多,支持大数据场景下的海量数据(每天40万亿条数据规模以上)报表服务。整个平台集成了数据采集、汇聚、缓存、分拣和管理模块等共5个模块,通过这个系统,业务只需要提供数据源、数据服务质量、数据落地集群和数据落地格式,即数据可以源源不断地将数据从源集群推送到目标集群,极大满足了业务大数据场景下的数据上报服务需求。 -## 编译 +## 2 编译 - Java [JDK 8](https://adoptopenjdk.net/?variant=openjdk8) - Maven 3.6.1+ @@ -38,39 +38,39 @@ inlong-tubemq-server inlong-website ``` -## 环境要求 +## 3 环境要求 - ZooKeeper 3.5+ - Hadoop 2.10.x 和 Hive 2.3.x - MySQL 5.7+ - Flink 1.9.x -## 部署InLong TubeMQ Server +## 4 部署InLong TubeMQ Server [部署InLong TubeMQ Server](modules/tubemq/quick_start.md) -## 部署InLong TubeMQ Manager +## 5 部署InLong TubeMQ Manager [部署InLong TubeMQ Manager](modules/tubemq/tubemq-manager/quick_start.md) -## 部署InLong Manager +## 6 部署InLong Manager [部署InLong Manager](modules/manager/quick_start.md) -## 部署InLong WebSite +## 7 部署InLong WebSite [部署InLong WebSite](modules/website/quick_start.md) -## 部署InLong Sort +## 8 部署InLong Sort [部署InLong Sort](modules/sort/quick_start.md) -## 部署InLong DataProxy +## 9 部署InLong DataProxy [部署InLong DataProxy](modules/dataproxy/quick_start.md) -## 部署InLong DataProxy-SDK +## 10 部署InLong DataProxy-SDK [部署InLong DataProxy](modules/dataproxy-sdk/quick_start.md) -## 部署InLong Agent +## 11 部署InLong Agent [部署InLong Agent](modules/agent/quick_start.md) -## 业务配置 +## 12 业务配置 [配置新业务](docs/user_guide/user_manual) -## 数据上报验证 +## 13 数据上报验证 到这里,您就可以通过文件Agent采集数据并在指定的Hive表中验证接收到的数据是否与发送的数据一致。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/user_manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/user_manual.md index 4613300165a..d6354ad32e6 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/user_manual.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/user_manual.md @@ -3,13 +3,13 @@ title: 用户手册 sidebar_position: 2 --- -# 1. 用户登录 +## 1 用户登录 需系统使用用户输入账号名称和密码。 ![](/cookbooks_img/image-1624433272455.png) -# 2. 
数据接入 +## 2 数据接入 数据接入模块展示目前用户权限内接入系统所有任务列表,可以对这些任务详情查看、编辑更新和删除操作。 @@ -17,9 +17,9 @@ sidebar_position: 2 ![](/cookbooks_img/image-1624431177918.png) -## 2.1 业务信息 +### 2.1 业务信息 -### 2.1.1 业务信息 +#### 2.1.1 业务信息 需要用户对接入任务填写基础业务信息。 @@ -30,7 +30,7 @@ sidebar_position: 2 - 业务责任人:至少2人,业务责任人可查看、修改业务信息,新增和修改所有接入配置项 - 业务介绍:剪短信对此次接入任务进行业务背景和应用介绍: -### 2.1.2 接入要求 +#### 2.1.2 接入要求 接入要求需要用户进行选择消息中间件:高吞吐(TUBE): @@ -38,13 +38,13 @@ sidebar_position: 2 高吞吐—Tube :高吞吐消息传输组件,适用于日志类的消息传递。 -### 2.1.3 接入规模 +#### 2.1.3 接入规模 接入规模需要用户预先针对接入数据进行规模判断,以便后续分配计算和存储资源。 ![](/cookbooks_img/image-1624431333949.png) -## 2.2 数据流 +### 2.2 数据流 点击【下一步】进入到数据流信息填写步骤,数据流信息填写有四个模块:基础信息、数据来源、数据信息、数据流向。 @@ -52,7 +52,7 @@ sidebar_position: 2 ![](/cookbooks_img/image-1624431416449.png) -### 2.2.1 基础信息 +#### 2.2.1 基础信息 需用户对该接入任务中数据流的基础信息进行填写: @@ -63,7 +63,7 @@ sidebar_position: 2 - 数据流责任人:数据流责任人可查看、修改数据流信息,新增和修改所有接入配置项 - 数据流介绍:数据流简单文本介绍 -### 2.2.2 数据来源 +#### 2.2.2 数据来源 需用户选择该数据流的消息来源,目前支持文件、自主推送三种方式,并且可以在高级选项中补充该数据来源详细信息: @@ -72,7 +72,7 @@ sidebar_position: 2 ![](/cookbooks_img/image-1624431594406.png) -### 2.2.3 数据信息 +#### 2.2.3 数据信息 需用户填写该数据流中数据相关信息: @@ -83,7 +83,7 @@ sidebar_position: 2 - 源字段分隔符:数据发送到 MQ 里的格式 - 源数据字段:数据在 MQ 里按某种格式划分的不同含义的属性 -### 2.2.4 数据流向 +#### 2.2.4 数据流向 需用户对此任务的流向终流向进行选择,此部分为非必填项,目前支持Hive和自主推送两种: @@ -103,9 +103,9 @@ HIVE流向: - JDBC url:hiveserver 的jdbcurl - 字段相关信息: 源字段名、源字段类型、HIVE字段名、HIVE字段类型、字段描述,并支持删除和新增字段 -# 3. 接入详情 +## 3 接入详情 -## 3.1 执行日志 +### 3.1 执行日志 当数据接入任务状态为”批准成功“和”配置失败“状态,可通过”执行日志“功能,以便用户查看任务执行进程进程和详情: @@ -117,34 +117,34 @@ HIVE流向: 执行日志中将展示该接入流程执行中任务类型、执行结果、执行日志内容、结束时间、如果执行失败可以”重启“该任务再次执行。 -## 3.2 任务详情 +### 3.2 任务详情 业务负责人/关注人可以查看该任务接入详情,并在【待提交】、【配置成功】、【配置失败】状态下可对部分信息进行修改更新接入任务详情中具有业务信息、数据流、流向三个模块。 -### 3.2.1 业务信息 +#### 3.2.1 业务信息 展示接入任务中基础业务信息,点击【编辑】可对部分内容进行修改更改: ![](/cookbooks_img/image-1624432076857.png) -### 3.2.2 数据流 +#### 3.2.2 数据流 展示该接入任务下数据流基础信息,点击【新建数据流】可新建一条数据流信息: ![](/cookbooks_img/image-1624432092795.png) -### 3.2.3 流向 +#### 3.2.3 流向 展示该接入任务中数据流向基础信息,通过通过下拉框选择不同流向类型,点击【新建流向配置】可新建一条数据流向: ![](/cookbooks_img/image-1624432114765.png) -# 4. 数据消费 +## 4 数据消费 数据消费目前不支持直接消费接入数据,需走数据审批流程后方可正常消费数据; 点击【新建消费】,进入数据消费流程,需要对消费信息相关信息进行填写: ![](/cookbooks_img/image-1624432235900.png) -## 4.1 消费信息 +### 4.1 消费信息 申请人需在该信息填写模块中逐步填写数据消费申请相关基础消费业务信息: @@ -160,35 +160,35 @@ HIVE流向: ![](/cookbooks_img/image-1624432286674.png) -# 5. 审批管理 +## 5 审批管理 审批管理功能模块目前包含了我的申请和我的审批,管理系统中数据接入和数据消费申请审批全部任务。 -## 5.1 我的申请 +### 5.1 我的申请 展示目前申请人在系统中数据接入、消费提交的任务列表,点击【详情】可以查看目前该任务基础信和审批进程: ![](/cookbooks_img/image-1624432445002.png) -### 5.1.1 数据接入详情 +#### 5.1.1 数据接入详情 数据接入任务详细展示目前该申请任务基础信息包括:申请人相关信息、申请接入基础信息,以及目前审批进程节点: ![](/cookbooks_img/image-1624432458971.png) -### 5.1.2 数据消费详情 +#### 5.1.2 数据消费详情 数据消费任务详情展示目前申请任务基础信息包括:申请人信息、基础消费信息,以及目前审批进程节点: ![](/cookbooks_img/image-1624432474526.png) -## 5.2 我的审批 +### 5.2 我的审批 作为具有审批权限的数据接入员和系统成员,具备对数据接入或者消费审批职责: ![](/cookbooks_img/image-1624432496461.png) -### 5.2.1 数据接入审批 +#### 5.2.1 数据接入审批 新建数据接入审批:目前为一级审批,由系统管理员审批。 @@ -196,7 +196,7 @@ HIVE流向: ![](/cookbooks_img/image-1624432515850.png) -### 5.2.2 新建数据消费审批 +#### 5.2.2 新建数据消费审批 新建数据消费审批:目前为一级审批,由业务负责人审批。 @@ -204,13 +204,13 @@ HIVE流向: ![](/cookbooks_img/image-1624432535541.png) -# 6. 
系统管理
+## 6 系统管理

角色为系统管理员的用户才可以使用此功能,他们可以创建、修改、删除用户:

![](/cookbooks_img/image-1624432652141.png)

-## 6.1 新建用户
+### 6.1 新建用户

具有系统管理员权限的用户,可以创建新的用户账号:

![](/cookbooks_img/image-1624432740241.png)

-## 6.2 删除用户
+### 6.2 删除用户

系统管理员可以对已创建的用户进行账户删除,删除后此账号将停止使用:

![](/cookbooks_img/image-1624432759224.png)

-## 6.3 修改用户
+### 6.3 修改用户

系统管理员可以修改已创建的账号:

![](/cookbooks_img/image-1624432797226.png)

-## 6.4 更改密码
+### 6.4 更改密码

用户可以修改账号密码,点击【修改密码】,输入旧密码和新密码,确认后此账号新密码将生效:

diff --git a/versioned_docs/version-0.11.0/modules/dataproxy/architecture.md b/versioned_docs/version-0.11.0/modules/dataproxy/architecture.md
index a7d72f5dab0..7f4f93cb9bc 100644
--- a/versioned_docs/version-0.11.0/modules/dataproxy/architecture.md
+++ b/versioned_docs/version-0.11.0/modules/dataproxy/architecture.md
@@ -1,14 +1,14 @@
---
title: Architecture
---
-# 1、intro
+## 1、intro

InLong-DataProxy belongs to the InLong proxy layer and is used for data collection, reception, and forwarding. Through format conversion, the data is converted into the TDMsg1 format that can be cached and processed by the cache layer.

InLong-DataProxy acts as a bridge from the InLong collection end to the InLong buffer end. DataProxy pulls the relationship between the business group id and the corresponding topic name from the manager module, and internally manages the producers of multiple topics.

The overall architecture of InLong-DataProxy is based on Apache Flume. On the basis of that project, inlong-bus expands the source layer and the sink layer, and optimizes disaster-tolerant forwarding, which improves the stability of the system.

-# 2、architecture
+## 2、architecture

![](img/architecture.png)

@@ -16,7 +16,7 @@ title: Architecture
2. The channel layer has a selector, which is used to choose which type of channel an event goes to. If the memory channel eventually fills up, the data is still processed rather than being discarded.
3. The data of the channel layer is forwarded through the sink layer. The main purpose here is to convert the data to the TDMsg1 format and push it to the cache layer (TubeMQ is the most common choice here).
-# 3、DataProxy support configuration instructions
+## 3、DataProxy support configuration instructions

DataProxy supports a configurable source-channel-sink pipeline, and the configuration method follows the same structure as a Flume configuration file:
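As a rough illustration only (the component names, types, and port below are placeholders, not the configuration actually shipped with DataProxy), a Flume-style source-channel-sink section is wired together like this:

```ini
# Illustrative sketch of the Flume-style layout; replace the names, types and ports
# with the source/channel/sink components provided by your DataProxy release.
agent1.sources = tcp-source
agent1.channels = ch-main
agent1.sinks = mq-sink

# source: where the data is received
agent1.sources.tcp-source.channels = ch-main
agent1.sources.tcp-source.type = <dataproxy-source-class>
agent1.sources.tcp-source.port = 46801

# channel: buffers events between the source and the sink
agent1.channels.ch-main.type = memory
agent1.channels.ch-main.capacity = 10000

# sink: converts events to TDMsg1 and pushes them to the cache layer (e.g. TubeMQ)
agent1.sinks.mq-sink.channel = ch-main
agent1.sinks.mq-sink.type = <dataproxy-sink-class>
```

Additional sources, channels, and sinks can be declared and wired in the same way, exactly as in a regular Flume agent definition.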